Abstract: In this paper, we consider the problem of distinguishing
the noisy codewords of a known binary linear block code
from a random bit sequence. We propose to use the generalized
likelihood ratio test (GLRT) to solve this problem. We also give
a formula for the approximate number of codewords required
and compare our results with an existing method.



I. Introduction 

In blind reconstruction of an error correcting code, the aim
is to reconstruct the underlying code from a noisy version
of the transmitted codeword sequence without knowledge of
the parameters of the code. For example, this problem arises
in cognitive radio and spectrum surveillance applications. The
problem was first introduced by Planquette [1] for linear block
codes. Valembois proved this problem to be NP-complete
[2]. In spite of this NP-completeness, Valembois [2], Cluzeau [3]
and others have suggested various algorithms which make use
of information set decoding techniques, such as those given by
Canteaut et al. [4]. Sicot, Houcke and Barbier [5] and Burel and
Gautier [6] have suggested algorithms which make use of the
Gaussian elimination process.

In this paper, we consider the problem of distinguishing the
noisy codewords of a known binary linear block code from a
random bit sequence. This problem was proposed by Chabot in
[7]. The main challenge in this problem is that the codewords
which are transmitted are not known to the receiver. The
solution proposed in [7] addresses this challenge by computing
the inner product of the received bit sequence with codewords
in the dual code. The difference in the distributions of the inner
product values in the presence and absence of the codewords
in the received bit sequence is used to solve the detection
problem.

In this paper, we propose a new method which makes use
of the generalized likelihood ratio test (GLRT) [8] to solve
the code detection problem. The GLRT addresses the issue
of the unknown codewords by first estimating them using
maximum likelihood decoding and then using the estimates
to perform a threshold test. The problem formulation is presented
in Section II. In Section III, we derive the GLRT structure and
the distribution functions needed for threshold testing. In
Section IV, we design a threshold test based on the Neyman-Pearson
criterion and on the sequential detection method. We also give a
formula for the approximate number of codewords required to
achieve a given performance. Performance results of the proposed
method and a comparison with an existing technique are presented
in Section V, followed by some concluding remarks in Section VI.

II. Problem Formulation 

We are faced with a binary hypothesis testing problem
where the null hypothesis $H_0$ corresponds to the situation
when the observed bit sequence consists of independent and
identically distributed (i.i.d.) bits with each bit equally likely
to be zero or one. The alternate hypothesis $H_1$ corresponds to
the situation when the observed bit sequence is the result of
passing $M$ unknown codewords of an $(n, k)$ binary linear block
code $C$ through a binary symmetric channel (BSC) having crossover
probability $p$. Let the observed bit sequence of length $Mn$ be
given by $\mathbf{Y} \in \mathbb{F}_2^{Mn}$. The binary hypothesis
testing problem is given by

$$\begin{aligned}
H_0 &: \mathbf{Y} \text{ is a random bit sequence of length } Mn \\
H_1 &: \mathbf{Y} = \mathbf{V} + \mathbf{E} = [\mathbf{V}_1 \ \mathbf{V}_2 \ \cdots \ \mathbf{V}_M] + \mathbf{E}
\end{aligned}$$

where $\mathbf{V} \in \mathbb{F}_2^{Mn}$ is such that each
$\mathbf{V}_i \in \mathbb{F}_2^n$ is a codeword in $C$, and
$\mathbf{E} \in \mathbb{F}_2^{Mn}$ is the error vector induced by
the BSC having crossover probability $p < \frac{1}{2}$. The entries
of $\mathbf{E}$ are i.i.d., taking value one with probability $p$.
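
To make the two hypotheses concrete, the following minimal sketch draws an observation under each of them. The (7, 4) Hamming generator rows, the packing of each block into an integer, and all function names are illustrative assumptions and are not part of the formulation above.

```python
import random

N = 7  # block length n of the illustrative code
# Generator rows of a systematic (7, 4) Hamming code, packed as 7-bit integers (assumed example).
GENERATOR = [0b1000110, 0b0100101, 0b0010011, 0b0001111]

def codebook(rows):
    """Enumerate all 2^k codewords as XOR combinations of the generator rows."""
    words = [0]
    for g in rows:
        words += [w ^ g for w in words]
    return words

def observe_h0(m, rng):
    """H0: m blocks of n i.i.d. equiprobable bits."""
    return [rng.getrandbits(N) for _ in range(m)]

def observe_h1(m, p, rng):
    """H1: m unknown codewords passed through a BSC with crossover probability p."""
    code = codebook(GENERATOR)
    blocks = []
    for _ in range(m):
        v = rng.choice(code)                                 # unknown codeword V_i
        e = sum((rng.random() < p) << i for i in range(N))   # BSC error pattern E_i
        blocks.append(v ^ e)                                 # Y_i = V_i + E_i over F_2
    return blocks

rng = random.Random(0)
y0 = observe_h0(5, rng)        # five blocks under H0
y1 = observe_h1(5, 0.1, rng)   # five noisy codewords under H1
```

Each list entry is one length-$n$ block of $\mathbf{Y}$; the receiver sees only these blocks and not which hypothesis produced them.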

Under the null hypothesis $H_0$, every vector
$\mathbf{y} \in \mathbb{F}_2^{Mn}$ is equally likely and hence the
probability mass function (pmf) of the observed vector is given by

$$p_0(\mathbf{y}) = \frac{1}{2^{Mn}}. \tag{1}$$

Under the alternate hypothesis $H_1$, the pmf of the observed
vector depends on the unknown codewords transmitted and is
given by

$$p_1(\mathbf{y}; \mathbf{V}) = p^{d_H(\mathbf{y}, \mathbf{V})} (1-p)^{Mn - d_H(\mathbf{y}, \mathbf{V})} \tag{2}$$

where $d_H(\mathbf{y}, \mathbf{V})$ is the Hamming distance between
the vectors $\mathbf{y}$ and $\mathbf{V}$.

III. Generalized Likelihood Ratio Test Structure 

We propose to use the generalized likelihood ratio test
(GLRT) to deal with the problem of the unknown codewords.
In this approach, the pmf of the observed vector under the
alternate hypothesis is calculated by substituting the
maximum likelihood (ML) estimates of the codewords. The
GLRT statistic for the detection problem is given by

$$\Lambda(\mathbf{y}) = \frac{p_1(\mathbf{y}; \hat{\mathbf{V}}_{\mathrm{ML}})}{p_0(\mathbf{y})}.$$
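For the BSC, calculating the ML estimates involves
finding the codewords which are nearest in Hamming distance
to the received vectors [9]. The GLRT decides that $H_1$ is
true if $\Lambda(\mathbf{y})$ exceeds a threshold and decides that
$H_0$ is true otherwise. For a threshold $T$, this can be
represented by

$$\Lambda(\mathbf{y}) \underset{H_0}{\overset{H_1}{\gtrless}} T.$$

Since $p_0(\mathbf{y})$ does not depend on $\mathbf{y}$ and
$p_1(\mathbf{y}; \hat{\mathbf{V}}_{\mathrm{ML}})$ is a monotonically
decreasing function of $d_H(\mathbf{y}, \hat{\mathbf{V}}_{\mathrm{ML}})$
for $p < \frac{1}{2}$, the GLRT can be simplified to the form

$$d_H(\mathbf{y}, \hat{\mathbf{V}}_{\mathrm{ML}}) \underset{H_1}{\overset{H_0}{\gtrless}} \tau. \tag{3}$$

Figure 1. The general structure of a $2^{n-k} \times 2^k$ standard array.
The top row lists the codewords, the first column lists the coset
leaders, and every other entry is the sum $\mathbf{e}_j + \mathbf{v}_i$
of a coset leader and a codeword.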

To find the optimal threshold $\tau_{\mathrm{opt}}$ using hypothesis
testing methods, we need to characterize the pmf of the GLRT statistic
$d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}})$ under the two
hypotheses. The GLRT statistic can be written as

$$d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) = \sum_{i=1}^{M} d_H(\mathbf{Y}_i, \hat{\mathbf{V}}_i)$$

where $\hat{\mathbf{V}}_i$ is the ML estimate of the $i$th codeword.
The random variables in the sum on the right hand side are i.i.d.
since all codewords are independent. If we can obtain the pmf of one
of the random variables in the sum, we obtain the pmf of the sum as
the $M$-fold discrete convolution of the individual pmf. Without loss
of generality we now find the pmf of
$d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ under both hypotheses, where
$\hat{\mathbf{V}}_1$ is the ML estimate of the first codeword
$\mathbf{V}_1$. We consider the standard array ML decoding
technique to find these pmfs.
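
The statistic itself is cheap to compute for a short code. The sketch below evaluates it by brute-force nearest-codeword search; the (7, 4) Hamming generator rows and the helper names are assumptions chosen for illustration, and exhaustive search is only practical for small $k$.

```python
# Generator rows of a systematic (7, 4) Hamming code as 7-bit integers (assumed example code).
GENERATOR = [0b1000110, 0b0100101, 0b0010011, 0b0001111]

def codebook(rows):
    """Enumerate all 2^k codewords as XOR combinations of the generator rows."""
    words = [0]
    for g in rows:
        words += [w ^ g for w in words]
    return words

def hamming_distance(a, b):
    """Number of bit positions in which the integers a and b differ."""
    return bin(a ^ b).count("1")

def glrt_statistic(blocks, code):
    """Sum over blocks of the distance to the nearest codeword (brute-force ML decoding)."""
    return sum(min(hamming_distance(y, v) for v in code) for y in blocks)

code = codebook(GENERATOR)
clean = [code[3], code[9], code[15]]                           # three error-free codewords
noisy = [code[3] ^ 0b0000001, code[9], code[15] ^ 0b0100000]   # two single-bit errors
print(glrt_statistic(clean, code), glrt_statistic(noisy, code))  # prints: 0 2
```

Small values of the statistic point towards $H_1$, which is why the simplified test in Equation (3) decides $H_1$ when the distance falls below the threshold.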

A. Standard Array Decoding and Coset Weight Distribution 

In a standard array, the set of all $2^n$ possible received
$n$-tuples is partitioned into $2^k$ disjoint subsets, each having
$2^{n-k}$ vectors, such that all the vectors in a subset are closest
to a particular codeword in $C$. The general structure of any standard
array is shown in Figure 1. More details can be found in [9].

The weight distribution of a code $C$ is defined as the set of
numbers $\{A_i\}$, where $A_i$ is the number of codewords of weight
$i$, $0 \le i \le n$ [9]. The weight distribution of any row in a
standard array and the weight distribution of the coset leaders are
defined in the same way. The weight distribution of the coset leaders,
together with the weight distributions of the rows corresponding to
these coset leaders, forms the coset weight distribution of the code.

Since we assume that the code is known, the coset weight
distribution of the code can be computed in advance. We consider
this a pre-calculation phase.
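
For a short code, the pre-calculation can be done by exhaustive enumeration. The following sketch builds the rows of a standard array for an assumed (7, 4) Hamming code and tabulates the weight distribution of each row; the generator rows and function names are illustrative, and the $2^n$ enumeration is only feasible for small $n$.

```python
from collections import Counter

N = 7
# Generator rows of a systematic (7, 4) Hamming code as 7-bit integers (assumed example code).
GENERATOR = [0b1000110, 0b0100101, 0b0010011, 0b0001111]

def weight(x):
    """Hamming weight of a vector packed into an integer."""
    return bin(x).count("1")

def codebook(rows):
    words = [0]
    for g in rows:
        words += [w ^ g for w in words]
    return words

def coset_weight_distributions(code, n):
    """Return a list of (coset leader, [B_0, ..., B_n]) pairs, one per row."""
    seen, rows = set(), []
    # Visiting vectors by increasing weight makes the first unseen vector a minimum-weight leader.
    for y in sorted(range(2 ** n), key=weight):
        if y in seen:
            continue
        row = [y ^ c for c in code]
        seen.update(row)
        counts = Counter(weight(v) for v in row)
        rows.append((y, [counts.get(i, 0) for i in range(n + 1)]))
    return rows

rows = coset_weight_distributions(codebook(GENERATOR), N)
leader_weights = Counter(weight(leader) for leader, _ in rows)
beta = [leader_weights.get(j, 0) for j in range(N + 1)]   # coset leader weight distribution
```

For this code the first row reproduces the weight distribution of the code itself, and `beta` counts one leader of weight zero and seven of weight one.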

B. GLRT Statistic Distribution under the Null Hypothesis 

When the null hypothesis $H_0$ is true, the received vector
$\mathbf{Y}_1$ is equally likely to be any vector in
$\mathbb{F}_2^n$. It takes a particular value with probability
$\frac{1}{2^n}$.

If the received vector $\mathbf{Y}_1$ falls in the first row of the
standard array, it is equal to a codeword in $C$ and the ML estimate
is $\hat{\mathbf{V}}_1 = \mathbf{Y}_1$. In this case,
$d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ is equal to zero. Thus
we have

$$\Pr[d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = 0; H_0] = \frac{2^k}{2^n} \tag{4}$$

since there are $2^k$ vectors in the first row of the standard array.

If the received vector $\mathbf{Y}_1$ falls in some row other than
the first row of the standard array, it is equal to the sum of the
coset leader $\mathbf{e}$ of the row and the codeword $\mathbf{v}$ at
the top of the column it falls in, i.e.
$\mathbf{Y}_1 = \mathbf{e} + \mathbf{v}$. Since the ML estimate
$\hat{\mathbf{V}}_1$ is equal to the codeword $\mathbf{v}$ at the top
of the column, we have

$$d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = d_H(\mathbf{e} + \mathbf{v}, \mathbf{v}) = w_H(\mathbf{e})$$

where $w_H(\mathbf{e})$ is the Hamming weight of the coset leader
$\mathbf{e}$. Let $\beta_j$ denote the number of coset leaders having
weight $j$. The weight distribution of the coset leaders consists of
the numbers $\beta_0, \beta_1, \ldots, \beta_n$. If the received
vector falls in any of the $\beta_j$ rows having coset leaders of
weight $j$, $d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ will take the
value $j$. In terms of the coset leader weight distribution
we have

$$\Pr[d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = j; H_0] = \frac{\beta_j 2^k}{2^n}, \tag{5}$$

for $0 \le j \le n$, since each of the rows has $2^k$ vectors in
the standard array.

Let $q_0(j) = \Pr[d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = j; H_0]$
denote the pmf of $d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ under the
null hypothesis $H_0$. Since the GLRT statistic
$d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}})$ is the sum of $M$
such i.i.d. random variables, its pmf can be obtained as

$$Q_0(j) = (\underbrace{q_0 * q_0 * \cdots * q_0}_{M \text{ times}})(j), \tag{6}$$

for $0 \le j \le Mn$, where $*$ denotes the convolution operator.
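
As a small numerical illustration of Equations (5) and (6), the sketch below forms $q_0$ from a coset leader weight distribution and convolves it $M$ times. The values of `beta` are those of the (7, 4) Hamming code (one coset leader of weight zero, seven of weight one); the function names and the choice $M = 3$ are assumptions made for the example.

```python
def convolve(a, b):
    """Discrete convolution of two pmfs given as lists indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def null_pmf(beta, n, k):
    """q0(j) = beta_j * 2^k / 2^n for j = 0, ..., n, as in Equation (5)."""
    return [b * 2 ** k / 2 ** n for b in beta]

def m_fold_convolution(q, m):
    """pmf of the sum of m i.i.d. variables with pmf q, as in Equation (6)."""
    out = [1.0]                      # pmf of the empty sum (point mass at 0)
    for _ in range(m):
        out = convolve(out, q)
    return out

beta = [1, 7, 0, 0, 0, 0, 0, 0]      # coset leader weights of the (7, 4) Hamming code
q0 = null_pmf(beta, n=7, k=4)        # [0.125, 0.875, 0, ...]
Q0 = m_fold_convolution(q0, 3)       # pmf of the statistic for M = 3, indices 0..Mn
```

The direct double loop is quadratic in the pmf length, which is adequate for the small $n$ and $M$ used here.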

C. GLRT Statistic Distribution under the Alternate Hypothesis 

Suppose the alternate hypothesis $H_1$ is true. The received
vector $\mathbf{Y}_1$ is equal to the sum of the transmitted codeword
$\mathbf{V}_1$ and the error vector $\mathbf{E}_1 \in \mathbb{F}_2^n$
induced by the BSC. As discussed in Section III-B, the statistic
$d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ is zero if the received vector
$\mathbf{Y}_1$ falls in the first row of the standard array. This is
possible if and only if the error vector $\mathbf{E}_1$ is equal
to a codeword in $C$. Let $A_i$ be the number of codewords in
$C$ having weight $i$. The probability that
$d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ is zero is given by

$$\Pr[d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = 0; H_1] = \Pr[\mathbf{E}_1 \in C] = \sum_{i=0}^{n} A_i p^i (1-p)^{n-i}. \tag{7}$$

Note that this probability does not depend on the transmitted
codeword $\mathbf{V}_1$.

Let $\mathbf{e}_j$ be the coset leader of the $j$th row in the
standard array. Then the set of all vectors in the $j$th row of the
standard array is given by $\mathbf{e}_j + C$. The probability that
the received vector falls in the $j$th row of the standard array is
given by

$$\begin{aligned}
\Pr[\mathbf{Y}_1 \in \mathbf{e}_j + C] &= \Pr[\mathbf{V}_1 + \mathbf{E}_1 \in \mathbf{e}_j + C] \\
&= \Pr[\mathbf{E}_1 \in \mathbf{e}_j + C] \\
&= \sum_{i=0}^{n} B_i^{(j)} p^i (1-p)^{n-i}
\end{aligned} \tag{8}$$

where $B_i^{(j)}$ is the number of vectors in the $j$th row with
weight $i$. The sequence $B_0^{(j)}, B_1^{(j)}, \ldots, B_n^{(j)}$ is
called the coset weight distribution of the $j$th row in the standard
array. Let $S_l \subseteq \{1, 2, \ldots, 2^{n-k}\}$ be the set of
rows in the standard array whose coset leaders have weight $l$. Then
we have

$$\Pr[d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = l; H_1] = \sum_{j \in S_l} \sum_{i=0}^{n} B_i^{(j)} p^i (1-p)^{n-i} \tag{9}$$

for $0 \le l \le n$. Note that the above probability does not depend
on the transmitted codeword $\mathbf{V}_1$. Since the first row in the
standard array is the only row having a zero weight coset
leader, we have $S_0 = \{1\}$. We also have $B_i^{(1)} = A_i$ since
the coset in the first row of the standard array is the code itself.

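Let $q_1(j) = \Pr[d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1) = j; H_1]$
denote the pmf of $d_H(\mathbf{Y}_1, \hat{\mathbf{V}}_1)$ under the
alternate hypothesis $H_1$. Since the GLRT statistic
$d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}})$ is the sum of $M$
such i.i.d. random variables, its pmf can be obtained as

$$Q_1(j) = (\underbrace{q_1 * q_1 * \cdots * q_1}_{M \text{ times}})(j), \tag{10}$$

for $0 \le j \le Mn$, where $*$ denotes the convolution operator.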
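
The per-codeword pmf under the alternate hypothesis can be evaluated directly from the coset weight distributions, in the manner of Equation (9). The sketch below does so for an assumed (7, 4) Hamming code with $p = 0.1$; the generator rows, helper names, and the exhaustive enumeration of $\mathbb{F}_2^n$ are illustrative choices suited only to small $n$.

```python
from collections import Counter

N = 7
# Generator rows of a systematic (7, 4) Hamming code as 7-bit integers (assumed example code).
GENERATOR = [0b1000110, 0b0100101, 0b0010011, 0b0001111]

def weight(x):
    return bin(x).count("1")

def codebook(rows):
    words = [0]
    for g in rows:
        words += [w ^ g for w in words]
    return words

def cosets(code, n):
    """Yield (leader weight, [B_0, ..., B_n]) for every row of a standard array."""
    seen = set()
    for y in sorted(range(2 ** n), key=weight):   # lightest unseen vector leads its coset
        if y not in seen:
            row = [y ^ c for c in code]
            seen.update(row)
            counts = Counter(weight(v) for v in row)
            yield weight(y), [counts.get(i, 0) for i in range(n + 1)]

def alt_pmf(code, n, p):
    """q1(l): sum over rows with leader weight l of sum_i B_i p^i (1-p)^(n-i)."""
    q1 = [0.0] * (n + 1)
    for l, B in cosets(code, n):
        q1[l] += sum(B[i] * p ** i * (1 - p) ** (n - i) for i in range(n + 1))
    return q1

q1 = alt_pmf(codebook(GENERATOR), N, p=0.1)
```

The entry `q1[0]` is the probability in Equation (7) that the error pattern is itself a codeword; the pmf $Q_1$ then follows by $M$-fold convolution of `q1`.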

IV. Threshold Design For The GLRT 

Using Equations (5), (6), (9) and (10), we can find the pmf of
$d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}})$ under both
hypotheses. The problem is now to find an optimal threshold
$\tau_{\mathrm{opt}}$ in Equation (3). We apply the Neyman-Pearson
hypothesis testing method to find $\tau_{\mathrm{opt}}$. We
also apply the sequential detection method.

A. Setting the Neyman-Pearson Threshold 

According to the Neyman-Pearson criterion, the optimal
threshold is given by

$$\tau_{\mathrm{opt}} = \arg\max_{\tau} P_D(\tau) \quad \text{under the constraint } P_F(\tau) \le \alpha$$

where $\alpha$ is the bound on the probability of false alarm. The
optimum decision rule is

1) Decide $H_1$ is true if $d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) < \tau_{\mathrm{opt}}$.

2) Decide $H_1$ is true with probability $\eta$ if $d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) = \tau_{\mathrm{opt}}$.

3) Decide $H_0$ is true if $d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) > \tau_{\mathrm{opt}}$.

Here $\eta$ and $\tau_{\mathrm{opt}}$ are chosen such that
$P_F(\tau_{\mathrm{opt}}) = \alpha$. The randomization in the decision
rule is necessary because of the discrete nature of the GLRT
statistic, which may prevent the false alarm probability from being
equal to $\alpha$ when a nonrandomized decision rule is used.

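The probability of false alarm $P_F(\tau_{\mathrm{opt}})$ is given by

$$\begin{aligned}
P_F(\tau_{\mathrm{opt}}) &= \Pr[d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) < \tau_{\mathrm{opt}}; H_0] + \eta \Pr[d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) = \tau_{\mathrm{opt}}; H_0] \\
&= \sum_{j < \tau_{\mathrm{opt}}} Q_0(j) + \eta Q_0(\tau_{\mathrm{opt}}),
\end{aligned} \tag{11}$$

where $Q_0(\tau_{\mathrm{opt}}) = 0$ if $\tau_{\mathrm{opt}}$ is not
an integer between $0$ and $Mn$. The probability of detection
$P_D(\tau_{\mathrm{opt}})$ is given by

$$\begin{aligned}
P_D(\tau_{\mathrm{opt}}) &= \Pr[d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) < \tau_{\mathrm{opt}}; H_1] + \eta \Pr[d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) = \tau_{\mathrm{opt}}; H_1] \\
&= \sum_{j < \tau_{\mathrm{opt}}} Q_1(j) + \eta Q_1(\tau_{\mathrm{opt}}),
\end{aligned} \tag{12}$$

where $Q_1(\tau_{\mathrm{opt}}) = 0$ if $\tau_{\mathrm{opt}}$ is not
an integer between $0$ and $Mn$.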

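To set the optimal threshold, find the largest integer $i$
between $0$ and $Mn$ such that $\sum_{j<i} Q_0(j) \le \alpha$ and set
$\tau_{\mathrm{opt}} = i$. If
$\sum_{j<\tau_{\mathrm{opt}}} Q_0(j) = \alpha$, set $\eta = 0$. If
$\sum_{j<\tau_{\mathrm{opt}}} Q_0(j) < \alpha$, randomization will be
required in the decision rule, and setting

$$\eta = \frac{\alpha - \sum_{j<\tau_{\mathrm{opt}}} Q_0(j)}{Q_0(\tau_{\mathrm{opt}})} \tag{13}$$

will result in the false alarm probability being equal to $\alpha$.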
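
The threshold-setting procedure can be sketched in a few lines. The example below is a sketch under stated assumptions: it uses the (7, 4) Hamming code with $p = 0.1$, for which the per-codeword pmfs are $q_0 = [0.125, 0.875]$ and $q_1 = [0.4834, 0.5166]$ once the zero entries for weights 2 to 7 are dropped, and the values $M = 10$, $\alpha = 0.05$ and all function names are illustrative.

```python
def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def m_fold_convolution(q, m):
    out = [1.0]
    for _ in range(m):
        out = convolve(out, q)
    return out

def neyman_pearson(Q0, Q1, alpha):
    """Return (tau, eta, P_F, P_D) for the randomized test that decides H1 on small distances."""
    tau, below = 0, 0.0                      # below = sum of Q0(j) over j < tau
    # Advance tau while the false alarm mass strictly below tau + 1 still fits under alpha.
    while tau < len(Q0) - 1 and below + Q0[tau] <= alpha:
        below += Q0[tau]
        tau += 1
    eta = min(1.0, (alpha - below) / Q0[tau]) if Q0[tau] > 0 else 0.0
    p_f = below + eta * Q0[tau]              # Equation (11)
    p_d = sum(Q1[:tau]) + eta * Q1[tau]      # Equation (12)
    return tau, eta, p_f, p_d

q0 = [0.125, 0.875]      # (7, 4) Hamming code under H0, trailing zeros dropped
q1 = [0.4834, 0.5166]    # same code under H1 with p = 0.1, trailing zeros dropped
M = 10
tau, eta, p_f, p_d = neyman_pearson(m_fold_convolution(q0, M),
                                    m_fold_convolution(q1, M), alpha=0.05)
```

Because the trailing zeros are dropped, the convolved pmfs have $M + 1$ entries rather than $Mn + 1$; the omitted entries are all zero and do not affect the threshold.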

B. Approximate Number of Codewords Required 

Define a random variable
$X_i^j = d_H(\mathbf{Y}_i, \hat{\mathbf{V}}_i)$, for
$i = 1, 2, \ldots, M$, under hypothesis $H_j$, for $j = 0, 1$. Since
the $\mathbf{Y}_i$'s are independent, the $X_i^j$'s are i.i.d. with
pmf given by Equations (4), (5), (7) and (9), with mean $\mu_j$ and
variance $\sigma_j^2$.

Define a random variable $X^j = X_1^j + X_2^j + \cdots + X_M^j$
corresponding to $d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}})$.
From the central limit theorem, the distribution of $\frac{1}{M} X^j$
can be approximated by a Gaussian distribution with mean $\mu_j$ and
variance $\frac{\sigma_j^2}{M}$. Let
$\Phi\left(\frac{x - \mu}{\sigma}\right)$ denote the cdf of a Gaussian
random variable with mean $\mu$ and variance $\sigma^2$, where
$\Phi(z)$ is the cdf of a standard Gaussian random variable.

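Now we know,

$$\begin{aligned}
P_F(\tau_{\mathrm{opt}}) &= \Pr\left[\tfrac{1}{M} d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) < \tau'_{\mathrm{opt}}; H_0\right] = \alpha \\
P_D(\tau_{\mathrm{opt}}) &= \Pr\left[\tfrac{1}{M} d_H(\mathbf{Y}, \hat{\mathbf{V}}_{\mathrm{ML}}) < \tau'_{\mathrm{opt}}; H_1\right] = \beta
\end{aligned}$$

where $\tau'_{\mathrm{opt}} = \frac{1}{M} \tau_{\mathrm{opt}}$.

From the central limit theorem we have,

$$\Phi\left(\frac{\tau'_{\mathrm{opt}} - \mu_0}{\sigma_0 / \sqrt{M}}\right) = \alpha, \qquad
\Phi\left(\frac{\tau'_{\mathrm{opt}} - \mu_1}{\sigma_1 / \sqrt{M}}\right) = \beta.$$

Solving the above two equations for $M$ we get

$$M = \left(\frac{\sigma_0 \Phi^{-1}(\alpha) - \sigma_1 \Phi^{-1}(\beta)}{\mu_1 - \mu_0}\right)^2. \tag{14}$$

Using Equation (14), the approximate number of codewords
required can be found for a given $\alpha$ and $\beta$.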
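
As a numerical check on Equation (14), the sketch below evaluates it for a Hamming code. Two shortcuts here are assumptions that do not appear in the text above: Hamming codes are perfect, so $q_0 = [1/(n+1), n/(n+1)]$, and the probability that the error pattern is a codeword is obtained from the MacWilliams identity with the simplex dual code, giving $(1 + n(1-2p)^{(n+1)/2})/(n+1)$. Function names are illustrative.

```python
from statistics import NormalDist

def mean_std(pmf):
    """Mean and standard deviation of a pmf given as a list indexed from 0."""
    mu = sum(j * pj for j, pj in enumerate(pmf))
    var = sum((j - mu) ** 2 * pj for j, pj in enumerate(pmf))
    return mu, var ** 0.5

def codewords_required(q0, q1, alpha, beta):
    """Approximate number of codewords M from the Gaussian approximation, Equation (14)."""
    mu0, sigma0 = mean_std(q0)
    mu1, sigma1 = mean_std(q1)
    z = NormalDist().inv_cdf                 # inverse cdf of the standard Gaussian
    return ((sigma0 * z(alpha) - sigma1 * z(beta)) / (mu1 - mu0)) ** 2

def hamming_pmfs(n, p):
    """Per-codeword pmfs (q0, q1) of a length-n Hamming code (closed form assumed, see lead-in)."""
    in_code = (1 + n * (1 - 2 * p) ** ((n + 1) // 2)) / (n + 1)   # Pr[E_1 in C]
    return [1 / (n + 1), n / (n + 1)], [in_code, 1 - in_code]

q0, q1 = hamming_pmfs(31, 0.05)
M = codewords_required(q0, q1, alpha=0.05, beta=0.997)
```

For Hamm(31, 26) with $p = 0.05$, $\alpha = 0.05$ and $\beta = 0.997$ the result should come out near 61, in line with the 61.50 listed for this code in Table I.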

C. Sequential Detection Method

The Neyman-Pearson method is a fixed sample size method, i.e. the
number of codewords $M$ is fixed. In the sequential detection
method, the number of codewords $M_s$ is varied to achieve
a specified $\alpha$ and $\beta$ [10]. Thus the number of samples
$M_s$ is now a random variable.

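Let us denote the pmfs $q_0(j)$ and $q_1(j)$ derived in
Sections III-B and III-C by

$$\begin{aligned}
q_0(j) &= [r_0 \ r_1 \ \cdots \ r_n] \\
q_1(j) &= [s_0 \ s_1 \ \cdots \ s_n]
\end{aligned}$$

where $r_j$ is $\Pr[d_H(\mathbf{Y}_i, \hat{\mathbf{V}}_i) = j; H_0]$
and similarly $s_j$ is
$\Pr[d_H(\mathbf{Y}_i, \hat{\mathbf{V}}_i) = j; H_1]$.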

Now consider the sequence of distances
$d_H(\mathbf{Y}_i, \hat{\mathbf{V}}_i)$ corresponding to the
received codeword sequence. Let the random variable $D_j$ indicate
the number of times Hamming distance $j$ was observed
in this sequence. Thus the vector
$\mathbf{D} = (D_0, D_1, \ldots, D_n)$ follows a
multinomial distribution with parameters $r_0, r_1, \ldots, r_n$
under hypothesis $H_0$ and with parameters $s_0, s_1, \ldots, s_n$
under hypothesis $H_1$.

The likelihood ratio $\Lambda_m$ after $m$ codewords is given by

$$\Lambda_m = \frac{s_0^{D_0} \, s_1^{D_1} \cdots s_n^{D_n}}{r_0^{D_0} \, r_1^{D_1} \cdots r_n^{D_n}}.$$

According to [10], the decision rule is as follows:

if $B < \Lambda_m < A$, take additional codewords
if $\Lambda_m \ge A$, accept $H_1$ and terminate the process
if $\Lambda_m \le B$, accept $H_0$ and terminate the process

where the boundary points $A$, $B$ are given by

$$A = \frac{\beta}{\alpha} \quad \text{and} \quad B = \frac{1-\beta}{1-\alpha}.$$

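From [8], the expected number of codewords $M_s$ required
under hypotheses $H_0$ and $H_1$ for the sequential detection method
is given by

$$\begin{aligned}
E\{M_s \mid H_0\} &= \frac{1}{\delta_0} \left[ (1-\alpha) \log \frac{1-\alpha}{1-\beta} + \alpha \log \frac{\alpha}{\beta} \right] \\
E\{M_s \mid H_1\} &= \frac{1}{\delta_1} \left[ (1-\beta) \log \frac{1-\beta}{1-\alpha} + \beta \log \frac{\beta}{\alpha} \right]
\end{aligned} \tag{15}$$

It can be shown that,

$$\delta_0 = \sum_{i=0}^{n} r_i \log \frac{r_i}{s_i} \quad \text{and} \quad \delta_1 = \sum_{i=0}^{n} s_i \log \frac{s_i}{r_i}.$$

Using Equation (15), the expected number of codewords can
be found for a given $\alpha$ and $\beta$.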
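
The sequential test and the expected number of codewords can be sketched as follows. This is a minimal sketch assuming the standard Wald boundaries $A = \beta/\alpha$ and $B = (1-\beta)/(1-\alpha)$, with $\beta$ the probability of detection, and the usual Wald approximations for the expected sample size; the pmfs are those of the (7, 4) Hamming code with $p = 0.1$ (zero entries dropped), and all names are illustrative.

```python
import math

def divergence(a, b):
    """sum_i a_i log(a_i / b_i); terms with a_i = 0 contribute nothing."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

def expected_codewords(r, s, alpha, beta):
    """Wald approximations of E{M_s | H0} and E{M_s | H1}."""
    d0, d1 = divergence(r, s), divergence(s, r)
    e0 = ((1 - alpha) * math.log((1 - alpha) / (1 - beta))
          + alpha * math.log(alpha / beta)) / d0
    e1 = ((1 - beta) * math.log((1 - beta) / (1 - alpha))
          + beta * math.log(beta / alpha)) / d1
    return e0, e1

def sequential_test(distances, r, s, alpha, beta):
    """Process per-codeword distances one at a time; return (decision, codewords used)."""
    log_a = math.log(beta / alpha)               # upper boundary, accept H1
    log_b = math.log((1 - beta) / (1 - alpha))   # lower boundary, accept H0
    log_ratio = 0.0
    for m, d in enumerate(distances, start=1):
        log_ratio += math.log(s[d] / r[d])       # requires r[d] > 0 and s[d] > 0
        if log_ratio >= log_a:
            return "H1", m
        if log_ratio <= log_b:
            return "H0", m
    return "undecided", len(distances)

r = [0.125, 0.875]      # q0 of the (7, 4) Hamming code, trailing zeros dropped
s = [0.4834, 0.5166]    # q1 of the same code with p = 0.1, trailing zeros dropped
e0, e1 = expected_codewords(r, s, alpha=0.05, beta=0.95)
```

Working with the logarithm of the likelihood ratio turns the product over observed distances into a running sum, which avoids underflow for long sequences.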

V. Performance Results 

A. Performance of GLRT method 

In this section, we present the performance of the GLRT
based code detection scheme for the (7, 4) Hamming code
when the Neyman-Pearson method is applied. For $\alpha = 0.05$, the
probability of detection $P_D(\tau_{\mathrm{opt}})$ for the (7, 4)
Hamming code is plotted in Figure 2 as a function of the number of
noisy codewords observed $M$ for different values of $p$. For each
value of $M$, the pmf $Q_0$ is used to set the threshold
$\tau_{\mathrm{opt}}$ and the randomization parameter $\eta$. The
probability of detection is obtained using Equation (12).




Figure 2. The probability of detection $P_D(\tau_{\mathrm{opt}})$ as a function of the number
of noisy codewords observed $M$ with $\alpha = 0.05$ for the (7, 4) Hamming code.

For $p = 0.1$, the receiver operating characteristic (ROC)
is shown in Figure 3 for different values of $M$. The ROC is
piecewise linear with changes in slope at
$\alpha = \sum_{j<i} Q_0(j)$ for $0 \le i \le Mn$. For
$\alpha \in \left[\sum_{j<i} Q_0(j), \sum_{j<i+1} Q_0(j)\right)$,
the optimal threshold will be chosen to be equal to $i$ and the
slope of the ROC is $Q_1(i)/Q_0(i)$ (see Equations (12) and (13)).
As one would expect, the shape of the ROC becomes more favorable as
the number of noisy codewords observed increases.




Figure 3. The probability of detection $P_D(\tau_{\mathrm{opt}})$ as a function of $\alpha$ for the
(7, 4) Hamming code with $p = 0.1$.

B. Comparison of GLRT Method with Chabot's Method

We now compare our method with the method proposed by
Chabot [7] with respect to the number of codewords required to
achieve the same performance. We use Equation (14) to find the
number of codewords required by our method. Table I shows
a comparison for various codes for $\alpha = 0.05$, $\beta = 0.997$
and for various values of $p$. Here, Hamm$(n, k)$ denotes a Hamming
code, RM$(n, k)$ denotes a Reed-Muller code and BCH$(n, k)$ denotes
a BCH code. The coset weight distribution of RM(64, 22) is taken
from [11].







Table I
Comparison of the number of codewords required by the GLRT method
with Chabot's method

Linear block code    p       No. of codewords         No. of codewords
                             required by              required by
                             GLRT method              Chabot's method
Hamm(31,26)          0.05    61.50                    550.42
                     0.07    183.01                   2397
Hamm(63,57)          0.05    560.31                   16371
                     0.07    6.19 × 10^3              3 × 10^5
Hamm(127,120)        0.05    1.19 × 10^5              1.39 × 10^7
                     0.07    3.70 × 10^7              4.68 × 10^[illegible]
RM(32,16)            0.1     9.25                     674.12
                     0.15    40.07                    5800
RM(64,22)            0.1     49.55                    2.44 × 10^[illegible]
                     0.15    1.35 × 10^[illegible]    1.75 × 10^5
BCH(15,7)            0.1     10.39                    102.83
                     0.15    29.12                    322.91
BCH(31,16)           0.1     10.67                    674.12
                     0.15    46.52                    5800


It can be seen from Table I that the number of codewords
required by the GLRT method is considerably less than
that required by Chabot's method. However, the challenge in the
GLRT method is finding the coset weight distribution of the
code. Hence the GLRT method is best suited for codes of
moderate length or for codes whose coset weight distribution
is known.

C. Comparison of Neyman-Pearson and Sequential Detection 
Method 

We now compare the number of codewords required by the
Neyman-Pearson method, denoted by $M$, with that required
by the sequential detection method, denoted by $M_s$, for the same
values of $p$, $\alpha$ and $\beta$. Table II shows a comparison for
$\alpha = 0.05$, $p = 0.05$ and for various values of $\beta$ for
Hamm(15, 11).



Table II
Comparison of the number of codewords required by the
Neyman-Pearson method with the sequential detection method

β         No. of codewords           No. of codewords
          required by                required by
          Neyman-Pearson method      seq. detection method
0.5787    5                          3.0665
0.6953    8                          4.2347
0.7738    10                         5.1228
0.8980    14                         6.7518
0.9218    17                         7.1081
0.9561    20                         7.6650
0.9962    35                         8.4460
0.9973    37                         8.4718



In the Neyman-Pearson method, we first fix the number of
codewords $M$. Then, for a given $\alpha$, we find the decision rule
which maximizes the probability of detection $\beta$ as explained in
Section IV-A. In the sequential detection method, for a
given $\alpha$ and $\beta$, we find the expected number of codewords
$M_s$ required using Equation (15). It can be seen that the number
of codewords required by the sequential detection method is less
than that required by the Neyman-Pearson method.

VI. Conclusion 
In this paper, we have derived a new method for detecting 
binary linear block codes in noise based on GLRT. The GLRT 
method involves ML decoding of the received bit sequence 
and performing a threshold test on the Hamming distance 
between the ML estimates of the codewords and the received 
bit sequence. In this work, we choose the threshold according 
to the Neyman-Pearson criterion and the sequential detection 
method. We observe that the number of codewords required 
by our method is considerably less when compared with the 
existing method. This method is suitable for codes of moderate 
length or when the coset weight distribution of the code is 
known. 

Note that in this method we have assumed that codewords
are perfectly synchronized. The problem of detecting the first
bit of the codeword is discussed by Sicot et al. [12]. One
future direction will be to extend this GLRT based method
to the case when codewords are not perfectly synchronized.